Incorporating Prosodic Boundaries in Unsupervised Term Discovery

Authors

  • Bogdan Ludusan
  • Guillaume Gravier
  • Emmanuel Dupoux
Abstract

We present a preliminary investigation into the usefulness of prosodic boundaries for unsupervised term discovery (UTD). Studies in language acquisition show that infants use prosodic boundaries to segment continuous speech into word-like units. We evaluate whether such a strategy could also help UTD algorithms. Running a previously published UTD algorithm (MODIS) on a corpus of prosodically annotated English broadcast news revealed that many discovered terms straddle prosodic boundaries. We then implemented two variants of this algorithm: one that discards straddling items and one that truncates them to the nearest boundary (either prosodic or pause marker). Both variants achieved a better term-matching F-score than the baseline, and higher-level prosodic boundaries proved more useful than lower-level boundaries or pause markers. In addition, the truncation algorithm, but not the discard algorithm, increased the word boundary F-score over the baseline.
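The two variants described in the abstract can be illustrated with a minimal sketch. It is not the MODIS implementation: the function names, the representation of terms as `(start, end)` time intervals in seconds, and the sorted list of boundary times are all assumptions made for illustration. The abstract does not say which part of a truncated term is kept, so the sketch assumes the longest boundary-free piece survives.

```python
from bisect import bisect_right

# Hypothetical data model: a discovered term is a (start, end) interval in
# seconds, and boundaries is a sorted list of boundary times (prosodic
# boundaries or pause markers). None of these names come from MODIS.

def straddles(start, end, boundaries):
    """Return True if some boundary lies strictly inside (start, end)."""
    i = bisect_right(boundaries, start)  # first boundary after start
    return i < len(boundaries) and boundaries[i] < end

def discard_straddling(terms, boundaries):
    """Discard variant: drop every term that crosses a boundary."""
    return [(s, e) for s, e in terms if not straddles(s, e, boundaries)]

def truncate_straddling(terms, boundaries):
    """Truncation variant: cut each crossing term at the boundaries inside it.

    Assumption: the longest boundary-free piece is the one that is kept.
    """
    result = []
    for s, e in terms:
        inside = [b for b in boundaries if s < b < e]
        if not inside:
            result.append((s, e))
            continue
        cuts = [s] + inside + [e]
        pieces = list(zip(cuts, cuts[1:]))
        result.append(max(pieces, key=lambda p: p[1] - p[0]))
    return result

terms = [(0.0, 1.0), (1.5, 3.0), (3.0, 3.4)]
boundaries = [2.0, 3.5]
kept = discard_straddling(terms, boundaries)
cut = truncate_straddling(terms, boundaries)
```

In this toy input only the second term crosses a boundary (at 2.0 s), so the discard variant removes it while the truncation variant shortens it. A boundary that coincides exactly with a term edge is not treated as a crossing.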

Similar resources

The role of prosodic boundaries in word discovery: Evidence from a computational model.

This study aims to quantify the role of prosodic boundaries in early language acquisition using a computational modeling approach. A spoken term discovery system that models early word learning was used with and without a prosodic component on speech corpora of English, Spanish, and Japanese. The results showed that prosodic information induces a consistent improvement both in the alignment of ...

Prosodic boundary information helps unsupervised word segmentation

It is well known that prosodic information is used by infants in early language acquisition. In particular, prosodic boundaries have been shown to help infants with sentence and word-level segmentation. In this study, we extend an unsupervised method for word segmentation to include information about prosodic boundaries. The boundary information used was either derived from oracle data (hand-anno...

A multilingual study on intensity as a cue for marking prosodic boundaries

Speech intensity is one of the main prosodic cues, playing a role in most of the suprasegmental phenomena. Despite this, its contribution to the signalling of prosodic hierarchy is still relatively understudied, compared to the other cues, like duration or fundamental frequency. We present here an investigation on the role of intensity in prosodic boundary detection in four different languages,...

Unsupervised Extraction of Prosodic Structure

Our approach for unsupervised extraction of prosodic structure in spontaneous speech consists of four steps: chunking into interpausal units, syllable nucleus extraction, prosodic boundary detection, and pitch accent detection. The extraction is based on acoustic features derived from F0 parameterization, and on energy and segment duration features. Phrase boundaries and accents are detecte...

Unsupervised Syntactic Chunking with Acoustic Cues: Computational Models for Prosodic Bootstrapping

Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosody, which NLP systems typically ignore. To evaluate the utility of prosodic information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and raw acoustic correl...

Publication date: 2014